*****************************************************************
*                                                               *
*     ASMGEN.COM - by J. Gersbach and J. Damke (Ver. 2.01)      *
*                                                               *
* A program to generate cross-referenced assembly language code *
* from any executable file.                                     *
*                                                               *
*****************************************************************
* PREFACE *
This program will generate 8086/87/88 assembly code text
that is compatible with the IBM Personal Computer Macro
Assembler from any executable diskette file up to 65,535
bytes. The output can be routed to the console or a disk-
ette file. A reference list may be generated separately or
embedded at the appropriate instruction counter address in
the assembly code.
Some manual touch up will be required before reassembly, but
nearly all the typing is done for you by ASMGEN and anything
questionable is marked with "??".
A file of sequential instructions may be resident on the
same diskette to indicate to ASMGEN which addresses contain
code, bytes, words, or strings. This file may also include
instructions to assume segment register values or toggle the
output of assembly code text, generation of the reference
table, 8087 mnemonics, or the inclusion of embedded reference
information in the assembly file.
DEBUG may be used to browse through the executable file to
determine the starting locations of code and data to develop
the sequential instruction file. It is important to accu-
rately specify these locations for an accurate reference
table and minimum touching up of the ASM output text.
The number of references within the file determines the amount
of memory required since a reference table is built in
memory during the first pass. Disassembly is done from disk
and only one file sector is in memory at any given time.
Therefore memory size does not limit the size of the file
to be disassembled. 48K bytes of memory will be enough for
most programs but a few will need 64K or 128K. One diskette
drive is sufficient but two are more convenient.
* STARTING ASMGEN *
There are two ways to work with ASMGEN: either by using the
command menu or by calling ASMGEN with parameters.
Following are the descriptions of both options.
* USING THE ASMGEN MENU *
The program is invoked by typing: ASMGEN
You are then prompted for a file specification. Respond with
the name of the executable file from which you wish to
generate the assembly code. The executable file will normally
have an extension of .EXE or .COM. ASMGEN will check this
file spec for validity and then respond with a prompt that
includes a summary of the command letters indicating that
you may give it a command. The executable file contents
are not checked for valid code and ASMGEN will try to dis-
assemble text or compressed BASIC files and produce unintell-
igible assembly code.
The commands are:
X filespec     This file spec replaces any previous executable
               file spec. The usual file extension is .COM
               or .EXE.
               EXAMPLE: X DATE.COM
A <filespec>   The executable file is disassembled and the
               assembly code is routed to the specified file.
               The usual file extension is .ASM. If the filespec
               is omitted, the output will default to the
               console.
               EXAMPLE: A DATE.ASM
R <filespec>   The reference table is sent to the file
               specified. The usual file extension is .TBL. If
               the filespec is omitted, the output will default
               to the console.
               EXAMPLE: R DATE.TBL
Q              The program is terminated and control returned to
               DOS.
Each time a command has been executed, ASMGEN waits with a one
line prompt for the next command.
X <filespec>, A <CON>, R <CON> or Q ?
The default filespec for each command is shown in brackets.
Enter the next command of your choice as described above.
* USING ASMGEN WITH PARAMETER CALLS *
Up to three file specifications may be included when ASMGEN is
first called from DOS. The executable file's name is given
first, followed by specifications for the assembly and reference
table files.
EXAMPLE: ASMGEN DATE.COM, DATE.ASM, DATE.TBL
If a semicolon follows the last filespec, ASMGEN will exit to DOS
when the command has been executed. If no semicolon is entered,
ASMGEN will display the menu options described above and wait for
further input after executing the command.
EXAMPLE: ASMGEN DATE.COM, DATE.ASM;
If the filespec for the .ASM file and/or .TBL file is omitted,
ASMGEN will generate first the .ASM file, then a .TBL file using
the filename of the first filespec.
EXAMPLE: ASMGEN DATE.COM,,; creates DATE.ASM and DATE.TBL and
exits to DOS.
If only the reference table is desired, the dummy name NUL should
be entered in place of an .ASM filespec.
EXAMPLE: ASMGEN DATE.COM, NUL, DATE.TBL
If only one filespec is given when the program is called, the
reference table is built in memory and then the menu options are
displayed for further commands.
EXAMPLE: ASMGEN DATE.COM
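The parameter rules above can be sketched in Python. This is only an
illustration, not ASMGEN's own code: the function name
parse_asmgen_args is hypothetical, and it assumes that a field which
is present but empty takes the default name, while a field that is
absent altogether is simply not requested.

```python
from pathlib import PureWindowsPath

def parse_asmgen_args(tail):
    """Split an ASMGEN argument string into (exe, asm, tbl, exit_to_dos)."""
    tail = tail.strip()
    exit_to_dos = tail.endswith(";")      # trailing semicolon: exit to DOS
    if exit_to_dos:
        tail = tail[:-1]
    fields = [f.strip() for f in tail.split(",")]
    exe = fields[0]
    stem = PureWindowsPath(exe).stem      # "DATE.COM" -> "DATE"

    def pick(index, ext):
        if index >= len(fields):
            return None                   # assumption: absent field, not requested
        return fields[index] or stem + ext  # empty field: default name

    return exe, pick(1, ".ASM"), pick(2, ".TBL"), exit_to_dos

print(parse_asmgen_args("DATE.COM,,;"))
print(parse_asmgen_args("DATE.COM, NUL, DATE.TBL"))
```

With a single filespec both output names come back as None, which
corresponds to the case where only the menu is shown.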
* PROGRAM EXECUTION *
The disassembly is done in two passes through the source file.
On pass #1, the reference table is built in memory and the actual
output is generated during pass #2. Once the reference table is
established, it remains in memory until an X or Q command is
issued, and subsequent A and R command executions skip pass #1.
This saves a lot of time when the executable file is large.
Three contiguous data areas are built dynamically in memory
during pass #1. First is the compressed sequential instruction
list. Second is a list of pointers for .EXE files that point to
the locations of all relocatable variables in the program, also
arranged in numerical order. These are established before
reading any code. Third, the reference table is then built in a
higher area of memory as pass #1 progresses.
If all available memory in the program segment is filled before
the first two data areas are completed, ASMGEN will abort to the
command prompt. After the reference table is started, a shortage
of memory will produce the message "Reference Table Incomplete
Due to Insufficient Memory" and continue.
Ctrl-Break may be used at any time to interrupt a command in
progress.
* READING THE ASSEMBLY CODE FILE (.ASM) *
This file begins with a title taken from the executable file's
name and date followed by the current date (in brackets).
If not inhibited by the M switch in a SEQ file (explained
later), the macro library will appear next in the file.
Next will be a .RADIX 16 pseudo-op which tells the macro
assembler that all numbers are in hexadecimal form.
Then comes a header that indicates a starting value for the code
segment, stack segment, instruction pointer and the stack
pointer. The stack pointer is usually set to FFFF for .COM files
but may be somewhat less depending on available memory. These
values are passed by the linker for .EXE files.
The first ASSUME statement might come next. There is one
generated for each segment that begins with code. All segment
registers are designated according to the current set of ASSUMEs.
They will sometimes be incorrect, so all ASSUME statements should
be checked prior to re-assembly.
The disassembled output follows, terminated by an END statement
and the execution address. An ORG pseudo-op is included if
required.
The text is compatible with the IBM Macro Assembler and the
format is the same except for RETurns. To avoid the need for
PROCedure titles, special mnemonics are provided for all RET
instructions. These are defined in the macro library at the
beginning of the file. Only macros that are needed for the
current file are produced. The optional embedded commands that
make up the reference table enhance the readability of the file.
For very large files, this is sometimes undesirable and a
separate reference table is best.
When invalid instructions are encountered in code areas, they are
reproduced as byte values followed by "??". If a near jump is
defined previously in the code, and it is within range of a short
jump, a NOP instruction is inserted after the jump. The
executable file created with this .ASM file and the Macro
Assembler and Linker will then be the same length as the original
file. This makes it less important to differentiate between
labels and numeric constants since the label values and their
offsets within the file will be the same. The fundamental
problem of disassembly is in knowing if the original assembly
code defined a number as a label which changes as a function of
its position or as a number that always remains the same. If
you make changes in the assembly code however, you must properly
specify all values. You might as well remove all NOPs at the same
time.
Labels are five characters long and begin with "L". Segment
labels begin with "S". The remaining characters are the current
instruction counter in hex form, thus making each label unique
and showing its location in the original file. The instruction
counter is continuous throughout the assembly code without
resetting at segment boundaries. The segment labels are then in
byte as opposed to paragraph form. In those cases where a label
value is modified by an ASSUME statement, the original value is
included as a comment in the referencing instruction so that it
may be easily changed back if it was not intended as a location.
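The naming rule for labels can be sketched in Python as follows.
The function name make_label is hypothetical; the sketch only
restates the rule given above (a prefix letter plus the instruction
counter as four hex digits).

```python
def make_label(counter, segment=False):
    """Build a five-character label from an instruction counter value."""
    prefix = "S" if segment else "L"       # "S" marks a segment label
    return f"{prefix}{counter & 0xFFFF:04X}"  # counter as four hex digits

print(make_label(0x04F7))                  # ordinary label
print(make_label(0x0380, segment=True))    # segment label
```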
The word "Relocatable" is printed at the end of any line that
contains an absolute paragraph value. These are values that DOS
modifies after loading but before executing a program. They are
used for loading segment registers that are sensitive to the
program location in memory. Relocatable values are not modified
by ASSUMEs. ASMGEN converts these numbers from paragraph to byte
values by multiplying them by sixteen so that they will fit
within the 16-bit instruction counter field. When the paragraph
value is negative or exceeds 0FFFH, it is left unchanged and a
warning (??) is issued on that line. When a program larger than
64K bytes is being disassembled, it should be divided into
smaller files.
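The paragraph-to-byte conversion of relocatable values can be
sketched in Python. The function name is hypothetical, and the
sketch assumes the paragraph value is handed in as a plain integer;
it returns the value together with a flag that stands for the "??"
warning.

```python
def relocatable_to_bytes(paragraph):
    """Return (value, flagged) for an absolute paragraph value."""
    if paragraph < 0 or paragraph > 0x0FFF:
        return paragraph, True        # left unchanged, line gets "??"
    return paragraph * 16, False      # fits the 16-bit counter field

print(relocatable_to_bytes(0x0038))
print(relocatable_to_bytes(0x1000))
```

The limit of 0FFFH follows from the arithmetic: 0FFFH times sixteen
is FFF0H, the largest product that still fits in sixteen bits.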
All words are produced as labels, except when the "L" switch has
been turned off in the .SEQ file (explained later). The label name
indicates its numeric value and, if the value does not fall on an
instruction boundary, its position relative to the current
instruction pointer is given by an EQU statement.
Therefore the Macro Assembler will assume that it is a location,
but it is easily changed to a constant since the value is given
in the label name. The word OFFSET precedes a label whenever it
is questionable whether it is a label or an immediate value. You
must decide which of the labels should be constants and which of
the constants should be labels, and change them accordingly.
When changing labels to numbers, be sure to append an "H" if the
number ends with a "D" or a "B" since the Macro Assembler will
otherwise assume that it is decimal or binary.
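Changing a label back to a number can be sketched in Python. The
function name label_to_number is hypothetical. The leading zero is
an addition that is not described above: the Macro Assembler expects
a numeric constant to begin with a decimal digit, so the sketch
supplies one when the hex value starts with a letter.

```python
def label_to_number(label):
    """Turn a label such as 'L04FD' into a hex constant for .RADIX 16."""
    digits = label[1:].upper()        # drop the "L" or "S" prefix
    if digits[-1] in "DB":
        digits += "H"                 # else read as decimal or binary
    if not digits[0].isdigit():
        digits = "0" + digits         # addition: constant must start 0-9
    return digits

print(label_to_number("L04FD"))
print(label_to_number("L0380"))
```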
Bytes are always treated as constants. An optional switch may be
included in the .SEQ file (explained later) which enables numbers
instead of labels if all references to the value are data segment
and immediate operation types.
An effective procedure to follow in attempting to understand the
assembly code file is to look first for the message text area,
the input commands, and the simpler subroutines. Then add label
names to addresses in the .SEQ file (explained later) that
remind you of their purpose. Add comments to the labels. If
these names are well chosen, the larger routines eventually will
become clear. The embedded references are produced as labels so
they will retain their meanings as they are changed.
It is also helpful to spend some time studying the structure of
data areas. Vector tables, which are frequently used to control
the program's flow, reveal the program's structure very quickly.
If some routines do not have labels at the beginning, it is
usually because the code or tables that reference them (or the
segment register assumptions) are not properly defined in the
.SEQ file.
* READING THE REFERENCE TABLE (.TBL) *
A referencee is defined as a number that is referenced somewhere
in the program. It may be a program location or a numeric
constant.
A referencor is defined as the address in the program from
which a reference is made to the referencee.
Each entry is composed of a referencEE followed by a list of
referencors. If more than one line is needed, additional lines
are indented to the first referencor position. The referencEE is
followed by an "S" if it includes references to the beginning of a
segment. The referencor is followed by two letters, the first of
which represents the segment register that is implied or prefixed
in the referencing instruction. The second letter indicates the
type of operation on the referencEE. When the reference entries
are embedded in the assembly code, all values are preceded with
the letter "L".
-----------------------------------------------------------------
   1st letter   |                  2nd letter
  SEG REGISTER  |               TYPE OF OPERATION
-----------------------------------------------------------------
   C  code      |  J  jump     M  modify - INC, ADD, etc.
   S  stack     |  C  call     I  immediate - value or offset
   D  data      |  R  read     T  test or compare
   E  extra     |  W  write    ?  unknown or ESC instruction
                |  P  port
----------------|------------------------------------------------
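The two-letter codes can be decoded mechanically, as this Python
sketch shows. The exact text layout of a referencor entry is not
spelled out above, so the sketch assumes a hex address immediately
followed by the two letters (for example 0123DR); the function name
decode_referencor is hypothetical.

```python
SEGMENTS = {"C": "code", "S": "stack", "D": "data", "E": "extra"}
OPERATIONS = {
    "J": "jump", "C": "call", "R": "read", "W": "write", "P": "port",
    "M": "modify", "I": "immediate", "T": "test or compare",
    "?": "unknown or ESC instruction",
}

def decode_referencor(entry):
    """Split an assumed 'addr + 2 letters' entry into its three parts."""
    addr, seg, op = entry[:-2], entry[-2], entry[-1]
    if addr.startswith("L"):          # embedded entries carry an "L"
        addr = addr[1:]
    return int(addr, 16), SEGMENTS[seg], OPERATIONS[op]

print(decode_referencor("0123DR"))
print(decode_referencor("L04F7CJ"))
```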
* WRITING/READING THE SEQUENTIAL INSTRUCTION FILE (.SEQ) *
The sequential instruction file is a list of special instructions
to ASMGEN which the user creates. The file takes the form of a
list of hexadecimal addresses and single-letter instructions or
generation switches. If used, the .SEQ file must be on the same
diskette as the source file and have the same name as the source
file with an extension of .SEQ. Each instruction in the file
must be in one of the following formats:
addr command
or
addr command ;comment
or
addr command label comment
or
addr command label comment ;comment
"addr" represents the instruction pointer value. All addr values
must be in numerical sequence in the file.
"command" may be either a toggle switch or a generation
instruction.
"label" is optional and replaces the label generated for this
address with this non-blank string.
"comment" is optional and must be preceded by "label". If no
label is wanted, use the dummy label "." in its place. Everything
following "label" is treated as an address comment and will be
printed in the ASM file after the generated instruction. The
address comment may be up to 255 characters in length and should
not contain a semi-colon.
";comment" is optional. Anything following a semi-colon in the
.SEQ file instructions is considered as a comment in the .SEQ
file only and is not added to the generated .ASM file.
"label" and "comment" are not allowed when a generation switch is
coded, but a ";comment" may be used to help clarify the .SEQ
file.
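The line formats above can be read with a few lines of Python. This
is a sketch of the format as described here, not of ASMGEN's own
parser; the function name parse_seq_line and the dictionary keys are
hypothetical. Lines beginning with an ampersand (continuations,
explained later) are returned as raw fields.

```python
def parse_seq_line(line):
    """Parse one .SEQ line; returns None for blank or comment-only lines."""
    body, _, _seq_comment = line.partition(";")   # ";comment" is dropped
    fields = body.split(None, 3)
    if not fields:
        return None
    if fields[0] == "&":
        return {"continuation": fields[1:]}
    addr_text = fields[0].upper()
    if addr_text.endswith("H"):                   # optional "H" suffix
        addr_text = addr_text[:-1]
    has_label = len(fields) > 2 and fields[2] != "."   # "." = dummy label
    return {
        "addr": int(addr_text, 16),
        "command": fields[1].upper(),
        "label": fields[2] if has_label else None,
        "comment": fields[3].strip() if len(fields) > 3 else None,
    }

print(parse_seq_line("04F7H S ;messages"))
print(parse_seq_line("1F45 W user_prt shows the prompt ;note"))
```

Since all addr values must be in numerical sequence, a caller could
also compare each parsed "addr" with the previous one and reject a
file that goes backwards.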
The .SEQ file is read into memory before the first pass starts.
The addresses and commands will be compressed, but "label" and
"comment" will be held in memory one to one. An effect of this
is that memory space required for dis-assembly increases with
each "label" and "comment" added to the .SEQ file.
* DESCRIPTION OF GENERATION SWITCHES *
THE VARIOUS TOGGLE SWITCHES ARE SET TO ON BY DEFAULT. Switches
may be toggled on and off at any point in the .SEQ
file/disassembly.
All option switches except /M and /H can be either toggled or
directly set by the user. A suffix of "+" turns the switch ON,
and a suffix of "-" turns the switch OFF. Switches encountered
in the file that have neither of these suffixes are toggled to
the opposite of their state at the time; ON switches are turned
OFF and OFF switches are turned ON.
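The suffix rules can be sketched in Python. The names below are
hypothetical, and the sketch models /H and /M as flags that start
OFF and are set by any appearance, which is an interpretation of
their "set only once" behavior rather than a statement of how
ASMGEN stores them.

```python
TOGGLES = "BEFLORT"      # switches that default to ON
SET_ONCE = "HM"          # /H and /M: any appearance sets them

def default_switches():
    state = {name: True for name in TOGGLES}
    state.update({name: False for name in SET_ONCE})
    return state

def apply_switch(state, token):
    """Apply a token such as '/O', '/O+' or '/O-' to the switch state."""
    name, suffix = token[1].upper(), token[2:]
    if name in SET_ONCE:
        state[name] = True             # assumption: never toggled back
    elif suffix == "+":
        state[name] = True
    elif suffix == "-":
        state[name] = False
    else:
        state[name] = not state[name]  # bare switch: flip current state
    return state

state = default_switches()
for token in ("/O", "/O+", "/R-"):
    apply_switch(state, token)
print(state["O"], state["R"])
```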
/B - generate byte references
When ON, byte and word references are included in the reference
table. When OFF, only word references are generated.
/E - embedded references in ASM file
When ON, reference table entries are inserted in the text just
before the referencee's definition statement. When OFF, these
entries are not included with the disassembled text. The entire
reference table can be printed with the "R" command.
/F - 8087 mnemonics
When ON, ESC instructions are produced. When OFF, ESC
instructions are assumed to be 8087 instructions and 8087
mnemonics are produced.
/H - append hex "H"
When this switch appears at any point in the .SEQ file, an "H" is
appended to all hex numbers. This does not, of course, apply to
the labels which are hex values preceded by the letter "L". The
.RADIX 16 pseudo-op is omitted which allows the assembler's radix
to default to decimal. This switch defaults to NO H APPEND.
Note that it will be set only once. It retains its value until
the next .SEQ file is read.
/L - generate label or number
When ON, all word references are treated as labels. When OFF, a
word reference is treated as a constant if all referencors are
data immediate types.
/M - suppress macro library
When this switch appears at any point in the .SEQ file, no macro
library is included in the text output. The DEFAULT IS THAT THE
MACRO LIBRARY WILL BE INCLUDED. Note that this switch will be
set only once. It retains its value until the next .SEQ file is
read.
/O - control ASM output
When ON, ASMGEN will output the generated text. When OFF, output
will be suppressed.
/R - control TBL output
When ON, ASMGEN will output the generated reference data. When
OFF, the reference table is not printed.
/T - control trace output
When ON, up to 16 bytes of object code are included as comments
in each line of the assembly code file. When OFF, object code is
not included.
* DESCRIPTION OF .SEQ FILE COMMANDS *
A - assume
The following lines contain ASSUMptions for segment register
values. They become effective at the address specified by this
instruction and may be modified anywhere in the disassembly. The
required format for assumptions is:
& 0400 DS
The ampersand indicates a continuation of the A instruction.
In this example, a data segment beginning at an instruction
pointer value of 400 will be assumed until another A
instruction changes it. CS, ES, and SS are also supported. The
segment assumptions are used for effective address calculations
only. The code segment assumption does not affect the
instruction pointer value.
B - bytes
The bytes encountered in the source file are assumed to have
meaning as single byte values.
C - code
The bytes encountered in the source file are assumed to be valid
8088 machine language instructions.
D - generate data operand
The operand of the instructions is changed to immediate data.
Subsequent bytes are interpreted as "C" (code follows).
I - initial value for IP
The hexadecimal value on this line overrides the instruction
pointer value at the beginning of the file - not to be confused
with the address at which execution begins. The default values
are 0000 for EXE files and 0100H for COM and other files. The
execution address following the END statement is omitted if this
option is invoked.
S - strings
The bytes encountered in the source file are assumed to form
text. Quoted text is produced for valid ASCII characters and
byte values for others.
# - defined length strings
The first byte encountered in the source file contains the length
of the character string which begins with the next encountered
character. This length value may be overridden by a subsequent
SEQ file instruction.
$ - defined length strings
The first byte encountered in the source file contains the length
of the character string which begins with the next encountered
character plus the length byte itself. This length value may be
overridden by a subsequent SEQ file instruction.
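The difference between the # and $ string types is only in what the
length byte counts, as this Python sketch shows. The function name
read_counted_string and its parameters are hypothetical.

```python
def read_counted_string(data, offset, includes_length_byte):
    """Return (string_bytes, next_offset) for a length-prefixed string."""
    length = data[offset]
    if includes_length_byte:           # "$": count covers the length byte
        length = max(length - 1, 0)
    start = offset + 1                 # "#": count covers characters only
    return data[start:start + length], start + length

data = b"\x05HELLO\x06WORLD"
first, nxt = read_counted_string(data, 0, includes_length_byte=False)
second, end = read_counted_string(data, nxt, includes_length_byte=True)
print(first, second, end)
```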
W - words
Pairs of bytes encountered in the source file are assumed to have
meaning as word values.
X - repeating data structure
A cyclic data structure is assumed to begin at the specified
instruction pointer value. The structure definition may follow
and is prefixed by an ampersand (&) to indicate the continuation
of this instruction. If the definition does not follow, then the
most recent definition is used. If no structure is yet defined,
then an error message is displayed.
The following elements may be used to define the structure:
& NNNN S - The next NNNN bytes are defined as string characters
& NNNN B - The next NNNN bytes are defined as byte values
& NNNN W - The next NNNN bytes are defined as word values
& XXNN $ - The next sequence of bytes is defined as NN fields.
           Each field consists of a length byte and a string of
           characters. The length of each field is contained in
           the first encountered byte. The high byte (XX), if
           non-zero, is a bit mask of the length field within
           the byte. The length field is right-justified within
           the byte after the byte value is sent to the output
           file.
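The masked length field of the $ element can be sketched in Python.
The function names are hypothetical. The sketch assumes the mask
selects one contiguous group of bits and that "right-justified"
means shifting that group down to bit 0.

```python
def split_descriptor(word):
    """Split an XXNN value into (mask, field_count)."""
    return word >> 8, word & 0xFF

def masked_length(first_byte, mask):
    """Extract the length field from the first byte of a $ field."""
    if mask == 0:
        return first_byte                   # no mask: whole byte is length
    shift = (mask & -mask).bit_length() - 1  # position of lowest mask bit
    return (first_byte & mask) >> shift     # right-justify the field

mask, count = split_descriptor(0x0E02)
print(mask, count, masked_length(0x6A, mask))
```

With a descriptor of 0E02 the mask is 0E, which selects bits 3, 2
and 1 of the first byte, and two fields follow.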
* EXAMPLES OF .SEQ COMMANDS *
This example .SEQ file shows all the possible instructions in the
appropriate format.
;All switches are on at the beginning.
0       /T              ;no object code as comments in output
0       /M              ;no macro library in output
0       /H              ;append "H" to all numbers
00H     /A              ;assume the following segment values
;Note that the ampersand (&) indicates the extended ASSUME
&       380 DS          ;the data segment starts at 380 hex
&       380 ES          ;the extra segment starts at 380 hex
0200    I               ;initialize the instruction pointer to 200
0200    /F              ;introduce 8087 mnemonics (not ESC)
0200    /E              ;no embedded references
0200    C               ;code begins at 200
0203H   W               ;words are at 203
0207    C               ;more code starting here
220     X               ;complex data structure begins here
&       3 W             ;words
&       1 B             ;byte
&       0E02 $          ;2 strings starting with the 2nd byte follow
                        ;bits 3,2,1 of the first byte contain the length
                        ;of the string including the length byte.
                        ;the high byte (0E) is the mask.
                        ;see also # in summary below
&       1 B             ;byte
                        ;the structure repeats until 351
351     B               ;bytes
358     C               ;more code
380     S               ;strings - list of messages
421     W               ;words
4FD     /B              ;no further byte references
502     /R              ;garbage here - turn off reference generation
502     /O              ;and output
600H    /O+             ;valid code - turn output back on
600     /R
600     C
1A60    /O-             ;output file about to fill diskette - turn output
                        ;off but keep scanning for references. another
                        ;run will be needed to get the remaining code.
1B00    /D              ;treat operand as immediate data
1DFD    /B+             ;continue with byte references
1F45    W user_prt      ;user provided labels will
2256    S $MSG          ;translate to upper case
Comments may be included if preceded by a semicolon.
Alphabetic characters may be either upper or lower case.
An "H" may follow the hex address.
* SAMPLE SESSION *
The external command CHKDSK.COM will serve as an example for this
sample session because it is short. The .SEQ file is also short
and easy to generate. Only these few instructions are needed.
0100 /T ;include object code as comments in .ASM file
0100 /E ;simpler output without references
04F7H S ;messages
04F7H /H ;append "H" to numeric values
Using DEBUG, browse through CHKDSK.COM to see how this was
arrived at. Usually, but not always, the best procedure is to
assume code. If the code appears unintelligible, display it in
hex/ASCII. If it is not text, assume bytes. Label positions in
the first disassembly may indicate that some locations should be
words. Next, generate the .ASM file by typing
ASMGEN CHKDSK.COM <enter>
A <enter>
The assembly code can be viewed on the screen. Then type
A CHKDSK.ASM <enter>
to save the assembly source code to a file. Then,
R CHKDSK.TBL <enter>
to save the cross-reference table to disk.
The Macro Assembler, Link.exe and Exe2bin could now be used to
assemble CHKDSK.ASM, link it to .EXE and convert it to a .COM
file. No modification should be necessary in this case.
If working with code that is to be modified, the symbol types
must be correctly specified as locations or as constants. If
they are constants, place them outside of any segment. The label
names may then be changed to make the code more readable.